
optimize bandpower window integration #80

Merged
merged 1 commit into from
Sep 4, 2024
Conversation

cmbant
Collaborator

@cmbant cmbant commented Jul 3, 2024

At least for the moment, the weights are mostly zero since the bandpowers are localized, so most elements of the matrix multiplication are not needed. This refactor speeds it up by a factor of about 8 on my laptop, hopefully enough to make dragging sampling useful. It could of course be further improved by putting the whole operation into numba with OpenMP.
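The idea above can be sketched roughly like this (hypothetical function and variable names, not the actual mflike code): since each bandpower window is nonzero only over a narrow ℓ range, the product can be sliced down to that support instead of multiplying by the full, mostly zero matrix.

```python
import numpy as np

def bandpower_slow(windows, cl):
    """Naive version: full matrix-vector product, windows (n_bands, n_ell)."""
    return windows @ cl

def bandpower_fast(windows, cl):
    """Sketch of the optimization: restrict each band's dot product to the
    contiguous nonzero support of its window (in practice the support
    ranges would be precomputed once, not found on every call)."""
    out = np.empty(windows.shape[0])
    for i, w in enumerate(windows):
        nz = np.flatnonzero(w)        # indices where this window is nonzero
        lo, hi = nz[0], nz[-1] + 1    # contiguous nonzero range
        out[i] = w[lo:hi] @ cl[lo:hi]
    return out
```

Both versions return the same bandpowers; the fast one touches only the localized window support, which is where the speed-up comes from.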

Now the remaining computing time is split roughly equally between _get_foreground_model and the rest. To see where time was spent I did

from io import StringIO
import pstats
from cProfile import Profile
prof = Profile()
prof.enable()
nuis_params['a_tSZ']*=1.00001 # make sure no cache on fast params
loglikes, derived = model.loglikes(nuis_params)
prof.disable()
s = StringIO()
ps = pstats.Stats(prof, stream=s)
ps.strip_dirs()
ps.sort_stats('time')
ps.print_stats(30)
ps.sort_stats('cumtime')
ps.print_stats(30)
print(s.getvalue())

@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (master@660e0fa). Learn more about missing BASE report.


@@            Coverage Diff            @@
##             master      #80   +/-   ##
=========================================
  Coverage          ?   87.59%           
=========================================
  Files             ?        3           
  Lines             ?      395           
  Branches          ?        0           
=========================================
  Hits              ?      346           
  Misses            ?       49           
  Partials          ?        0           
Files Coverage Δ
mflike/mflike.py 90.40% <100.00%> (ø)

@xgarrido
Collaborator

xgarrido commented Jul 5, 2024

Really looks good to me. I did the time benchmark as suggested and I'm getting the same speed-up factors.

One remark, a bit uncorrelated to this PR: I had a look into the _fast_chi_square function of cobaya which, most of the time, falls back to camb.mathutils.chi_squared (in case camb is installed), but when camb is not installed the default way to compute $\chi^2$ is https://github.com/CobayaSampler/cobaya/blob/master/cobaya/likelihoods/base_classes/DataSetLikelihood.py#L27. Regarding the time benchmark, we noticed some time ago that the dot product was slightly slower than the @ product introduced in Python 3.5. So we might save a few microseconds in case someone uses mflike (or any likelihood relying on _fast_chi_square) without camb installed.

@cmbant
Collaborator Author

cmbant commented Jul 5, 2024

Interesting, I had assumed they mapped to identical code. If I do this in Jupyter (for n = 9000), if anything the .dot form seems to be faster for me:

%timeit v @ cov @ v
%timeit cov.dot(v).dot(v)
%timeit v @ cov @ v
%timeit cov.dot(v).dot(v)

11.4 ms ± 222 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10.4 ms ± 63.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
11.2 ms ± 154 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10.6 ms ± 196 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

CAMB's version gets the extra factor of 2 from symmetry.
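If I understand correctly, the symmetry saving comes from a symmetric matrix-vector product that only reads one triangle of the matrix. A minimal sketch of the same idea in pure Python, using the BLAS dsymv routine exposed by scipy (chi_squared_sym is a hypothetical helper name, not the CAMB implementation):

```python
import numpy as np
from scipy.linalg.blas import dsymv

def chi_squared_sym(invcov, v):
    """Compute v^T invcov v for a symmetric invcov.

    dsymv reads only one triangle of the matrix, roughly halving the
    memory traffic of a general matrix-vector product.
    """
    return v @ dsymv(1.0, invcov, v)
```

For a dense symmetric covariance this gives the same result as v @ invcov @ v while doing about half the work.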

@xgarrido
Collaborator

xgarrido commented Jul 5, 2024

I also thought they mapped to the same piece of code. In your test, the cov matrix is very sparse (hence the purpose of this PR). If you do the test on the same 9000x9000 matrix with random numbers, then @ is much faster than dot. I'm not sure I understand the difference between dot and matmul, which corresponds to the @ implementation: https://numpy.org/doc/stable/reference/generated/numpy.matmul.html#numpy.matmul
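For what it's worth, for 2-D arrays dot and matmul return identical results; they only diverge for stacked (more than 2-D) inputs, where matmul broadcasts over the leading dimensions while dot combines the last axes of every pair:

```python
import numpy as np

a = np.zeros((2, 3, 4))
b = np.zeros((2, 4, 5))

# matmul treats the inputs as stacks of matrices and broadcasts:
np.matmul(a, b).shape  # (2, 3, 5): batched matrix product

# dot sums over the last axis of a and the second-to-last of b,
# pairing every matrix in a with every matrix in b:
np.dot(a, b).shape     # (2, 3, 2, 5)
```

So for the 2-D covariance times 1-D vector case in the benchmark above, any timing difference would come from the implementation path, not from a difference in semantics.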

@cmbant
Collaborator Author

cmbant commented Jul 5, 2024

Odd, I see the same with random numbers and matmul or @ (maybe it changed in numpy 2, or it depends on threading?).

@cmbant cmbant mentioned this pull request Jul 26, 2024
@cmbant cmbant merged commit fd46540 into master Sep 4, 2024
11 checks passed
@cmbant cmbant deleted the optimize branch September 4, 2024 07:56